Abstract

The mining of frequent subgraphs from labeled graph data has been
studied extensively. Furthermore, much attention has recently been paid
to frequent pattern mining from graph sequences. A method, called
GTRACE, has been proposed to mine frequent patterns from graph
sequences under the assumption that changes in graphs are gradual.
Although GTRACE mines the frequent patterns efficiently, it still needs
substantial computation time to mine the patterns from graph sequences
containing large graphs and long sequences. In this paper, we propose
a new version of GTRACE that enables efficient mining of frequent
patterns based on the principle of a reverse search. The underlying concept
of the reverse search is a general scheme for designing efficient algorithms
for hard enumeration problems. Our performance study shows that the
proposed method is efficient and scalable for mining both long and large
graph sequence patterns and is several orders of magnitude faster than
the original GTRACE.

1 Introduction 

Studies on data mining have established many approaches for finding
characteristic patterns from a variety of structured data. Graph mining, which
efficiently mines all subgraphs appearing more frequently than a given threshold
from a set of graphs, focuses on the topological relations between vertices in
the graphs [5]. AGM [10], gSpan [19], and Gaston [16] mine frequent subgraphs,
starting with those of size 1, using the anti-monotonicity of the support values.
Although the major algorithms for graph mining are quite efficient in practice,
they require substantial computation time to mine complex frequent subgraphs,
owing to the NP-completeness of subgraph isomorphism matching [5]. Accordingly,
the conventional methods are not suitable for very complex graphs, such as graph
sequences.

Graph sequences, however, are used extensively to model objects in many
real-world applications. For example, a human network can be represented as a
graph, where a human and the relationship between two humans correspond to
a vertex and an edge, respectively. If a human joins (or leaves) the community in
the human network, the numbers of vertices and edges in the graph increase (or
decrease). Similarly, a gene network consisting of genes and their interactions
produces a graph sequence in the course of its evolutionary history by acquiring
new genes, deleting genes, and mutating genes.

Recently, much attention has been paid to relevant frequent pattern mining
from graph sequences [11] (dynamic graphs [4] or evolving graphs [3]).
Figure 1(a) shows an example of a graph sequence containing 4 steps and 5
vertex IDs, denoted by the numbers attached to the vertices. The problem
we address in this paper is how to mine patterns, as shown in Fig. 1(b), that
appear more frequently than a given threshold from a set of graph sequences.
In [11], Inokuchi and Washio proposed transformation rules (TRs) for
representing graph sequences compactly under the assumption that the change in
each graph in the graph sequence is gradual. In other words, only a small
part of the structure changes, while the other part remains unchanged between
two successive graphs $g^{(j)}$ and $g^{(j+1)}$ in the graph sequence. For example, the
change between the successive graphs $g^{(j)}$ and $g^{(j+1)}$ in the graph sequence shown
in Fig. 2 is represented as an ordered sequence of two TRs
$\langle vi^{(j,1)}_{[1,A]}\, ed^{(j,2)}_{[(2,3),\bullet]} \rangle$.
This sequence of TRs implies that a vertex with vertex ID 1 and label A is
inserted (vi), and then the edge between the vertices with vertex IDs 2 and 3
is deleted (ed). By assuming that the change in each graph is gradual, we can
represent a graph sequence compactly even if the graph in the graph sequence
has many vertices and edges. Based on this idea, Inokuchi and Washio
proposed a method, called GTRACE (graph transformation sequence mining), for
efficiently mining all frequent patterns, called relevant FTSs (frequent
transformation subsequences), from ordered sequences of TRs [11]. In a similar manner
to PrefixSpan [19], GTRACE first recursively mines FTSs, appending a TR
to the tail of the mined FTS, and then removes irrelevant FTSs during
post-processing. Since most of the FTSs mined from graph sequences by GTRACE
are irrelevant, if we mine only relevant FTSs from the graph sequences, we can
greatly reduce the computation time for this mining process, thus enabling it
to be applied to graph sequences containing large graphs and long sequences.

Our objective graph sequence is more general than both the dynamic graph
and evolving graph, and GTRACE and the proposed method in this paper are
applicable to both dynamic graphs and evolving graphs, although the methods
of [4] and [3] for mining relevant frequent patterns are not applicable to
graph sequences.
In [4], Borgwardt et al. proposed a method for mining relevant frequent patterns
from a graph sequence represented by a dynamic graph. They assumed that
the number of edges in a dynamic graph increases and decreases, while the
number of vertices remains constant. They also assumed that labels assigned
to vertices in the dynamic graph do not change and that no labels are assigned
to edges. On the other hand, Berlingerio et al. proposed a method to mine
relevant frequent patterns from a graph sequence represented by an evolving
graph [3]. They assumed that the numbers of vertices and edges in an evolving
graph increase, but do not decrease, and that labels assigned to vertices and
edges in the evolving graph do not change. In addition, a vertex in an evolving
graph always comes with an edge connected to the vertex.

Figure 1: Examples of a graph sequence and subgraph subsequence for mining.

Figure 2: Change between two successive graphs.

1 The relevancy of frequent patterns is defined in Section 2.2.

In this paper, we propose a new version of GTRACE that enables more
efficient mining of only relevant FTSs based on the principle of a reverse search [2].
Our performance study shows that the proposed method is efficient and scalable 
for mining both long and large graph sequence patterns, and is several orders 
of magnitude faster than the original GTRACE. 

1.1 Frequent Graph Mining 

Graph mining is the task of finding novel, useful, and "understandable"
graph-theoretic patterns in a graph representation of data [5]. Frequent graph mining
is a representative task in graph mining that efficiently mines all subgraphs that
appear more frequently than a given threshold from a set of labeled graphs.

A graph database DB is a set of tuples (gid, g), where gid is a graph ID
and g is a labeled graph. A tuple (gid, g) is said to contain a graph p, if p is
a subgraph of g, i.e., $p \sqsubseteq g$. The support of graph p in database DB is the
number of tuples in the database containing p, i.e.,
$\sigma(p) = |\{gid \mid ((gid, g) \in DB) \wedge (p \sqsubseteq g)\}|$.
Given a positive integer $\sigma'$ as the support threshold, a graph
p is called a "frequent subgraph" pattern in the graph database DB, if at least
$\sigma'$ tuples in the database contain p, i.e., $\sigma(p) \ge \sigma'$. Representative methods for
frequent graph mining, such as AGM [10], gSpan [19], and Gaston [16], mine
frequent subgraphs starting with those of size 1, using the anti-monotonicity of
the support values.

We briefly review gSpan, because it is used to implement the method
proposed in this paper. Here, for the sake of simplicity, we assume that edges
in graphs have labels, whereas vertices in the graphs do not. Given a frequent
pattern p with n vertices and k edges, vertices in p are traversed in a
depth-first manner to assign vertex IDs from $v_1$ to $v_n$. The starting vertex
and the last visited vertex in the traversal are called the root $v_1$ and the
rightmost vertex $v_n$, respectively, while the straight path from $v_1$ to $v_n$ is called
the rightmost path. According to the traversal, p is represented by a DFS code
consisting of triplets $(u, u', l)$, where $l$ is the label of the edge between vertices
$v_u$ and $v_{u'}$ ($1 \le u, u' \le n$). The linear order of the DFS codes is defined as
follows. For DFS codes $\alpha = (a_1, a_2, \cdots, a_k)$ and $\beta = (b_1, b_2, \cdots, b_h)$,
$\alpha \preceq \beta$ iff either of the following conditions is true:

• $\exists t$, $1 \le t \le \min(k, h)$, $a_q = b_q$ for $q < t$, and $a_t \prec_e b_t$

• $a_q = b_q$ for $1 \le q \le k$, and $k \le h$

where $\prec_e$ is the linear order among the triplets $(u, u', l)$. Since there are many
DFS codes for an identical p, the minimal DFS code of the DFS codes
representing p is defined as the canonical code for p.

Figure 3: Depth-first traversal for a frequent pattern, panels (a)-(d).

Table 1: DFS codes for Fig. 3.

edge k   α (Fig. 3(b))   β (Fig. 3(c))   γ (Fig. 3(d))
1        (1,2,a)         (1,2,d)         (1,2,a)
2        (2,3,b)         (2,3,b)         (2,3,a)
3        (3,1,a)         (3,4,a)         (3,1,b)
4        (3,4,c)         (4,2,a)         (3,4,d)
5        (4,2,b)         (3,5,c)         (3,5,b)
6        (2,5,d)         (5,2,b)         (5,1,c)

Example 1 Table 1 gives the DFS codes for the different traversals shown in
Fig. 3(b)-(d) of the frequent subgraph p depicted in Fig. 3(a). According to the
linear order of the DFS codes, $\gamma \prec \alpha \prec \beta$, and $\gamma$ is the canonical code for p.
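
The ordering in Example 1 can be checked mechanically. The following is a minimal sketch, not gSpan's actual implementation: it assumes that plain tuple comparison on (u, u', label) stands in for the triplet order (gSpan's real order also distinguishes forward and backward edges), and the function names are hypothetical.

```python
# Sketch: lexicographic order on DFS codes and selection of the minimal code.
# Assumption: Python's tuple order on (u, u', label) plays the role of the
# triplet order; this happens to reproduce the ordering of Example 1.

def dfs_code_leq(alpha, beta):
    """Return True if alpha precedes or equals beta."""
    for a, b in zip(alpha, beta):
        if a != b:
            return a < b              # the first differing triplet decides
    return len(alpha) <= len(beta)    # one code is a prefix of the other


def canonical_code(codes):
    """Return the minimal DFS code among several codes of one pattern."""
    best = codes[0]
    for code in codes[1:]:
        if dfs_code_leq(code, best):
            best = code
    return best


# DFS codes of Table 1.
alpha = [(1, 2, "a"), (2, 3, "b"), (3, 1, "a"), (3, 4, "c"), (4, 2, "b"), (2, 5, "d")]
beta = [(1, 2, "d"), (2, 3, "b"), (3, 4, "a"), (4, 2, "a"), (3, 5, "c"), (5, 2, "b")]
gamma = [(1, 2, "a"), (2, 3, "a"), (3, 1, "b"), (3, 4, "d"), (3, 5, "b"), (5, 1, "c")]

minimal = canonical_code([alpha, beta, gamma])
```

Here `minimal` is the code labeled γ in Table 1, in agreement with Example 1.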

A frequent subgraph p is extended based on the pattern-growth principle.
Given a DFS code $\alpha = (a_1, a_2, \cdots, a_k)$ for p with k edges, p is extended by
adding a new edge to obtain a new pattern with k + 1 edges, which is
represented by $\alpha' = (a_1, a_2, \cdots, a_k, a_{k+1})$. The new edge can be added between the
rightmost vertex and other vertices on the rightmost path (backward extension),
or it can introduce a new vertex and connect to vertices on the rightmost path
(forward extension). Overall, new edges are only added to the vertices along the
rightmost path. With this restricted extension, gSpan reduces the generation
of duplicate frequent subgraphs, while still guaranteeing the completeness
of enumerating all frequent subgraphs.

2 Mining Graph Sequences 
2.1 Representation 

In this section, we briefly review the compilation used to represent graph
sequences compactly in GTRACE. Figure 1(a) shows an example of a graph
sequence. Graph $g^{(j)}$ is the j-th labeled graph in the sequence. The problem we
address in this paper is how to mine patterns that appear more frequently than
a given threshold from a set of graph sequences. In [11], Inokuchi and Washio
proposed TRs to represent graph sequences compactly under the assumption
that "the change over successive graphs is gradual". In other words, only a
small part of the graph changes between two successive graphs $g^{(j)}$ and $g^{(j+1)}$
in a graph sequence, while the other parts remain unchanged. In the
aforementioned human and gene networks, this assumption certainly holds, because
most of the changes in vertices are progressive over successive steps. A direct
representation of a graph sequence is not compact, because many parts of the
graph remain unchanged over several steps and are therefore redundant in the
representation. On the other hand, a graph sequence can be compactly
represented by introducing a representation of graph transformation based on rules
for insertion, deletion, and relabeling of vertices and edges under the gradual
change assumption.

A labeled graph g is represented as $g = (V, E, L, f)$, where $V = \{v_1, \cdots, v_z\}$
is a set of vertices, $E = \{(v, v') \mid (v, v') \in V \times V\}$ is a set of edges, and L is
a set of labels such that $f : V \cup E \to L$. A graph sequence is represented as
$d = \langle g^{(1)} g^{(2)} \cdots g^{(n)} \rangle$. We assume that each vertex v is mutually distinct from
the others in $g^{(j)}$ and has a vertex ID id(v) in d. We define the set of vertex
IDs to be $ID_V(d) = \{id(v) \mid v \in V(g^{(j)}), g^{(j)} \in d\}$ and the set of pairs of vertex
IDs to be $ID_E(d) = \{(id(v), id(v')) \mid (v, v') \in E(g^{(j)}), g^{(j)} \in d\}$. For example,
in the human network mentioned in Section 1, each person has a vertex ID, and
his/her gender is an example of a vertex label. To represent a graph sequence
compactly, we focus on the differences between two successive graphs $g^{(j)}$ and
$g^{(j+1)}$ in the sequence.

Definition 1 Given a graph sequence $d = \langle g^{(1)} \cdots g^{(n)} \rangle$, the differences between
$g^{(j)}$ and $g^{(j+1)}$ are interpolated by a virtual sequence $d^{(j)} = \langle g^{(j,1)} \cdots g^{(j,m_j)} \rangle$,
where $g^{(j,1)} = g^{(j)}$ and $g^{(j,m_j)} = g^{(j+1)}$, such that the edit distance [18] between
any two successive graphs is 1, and in which the edit distance between any two
intrastates is the minimum. Therefore, d is represented by the interpolations as
$d = \langle d^{(1)} \cdots d^{(n-1)} \rangle$. ■
Table 2: Transformation rules (TRs) representing graph sequence data.

Vertex Insertion     $vi^{(j,k)}_{[u,l]}$
    Insert a vertex with label l and vertex ID u into $g^{(j,k)}$ to transform
    to $g^{(j,k+1)}$.

Vertex Deletion      $vd^{(j,k)}_{[u,\bullet]}$
    Delete an isolated vertex with vertex ID u in $g^{(j,k)}$ to transform to
    $g^{(j,k+1)}$.

Vertex Relabeling    $vr^{(j,k)}_{[u,l]}$
    Relabel a vertex with vertex ID u in $g^{(j,k)}$ to be l to transform to
    $g^{(j,k+1)}$.

Edge Insertion       $ei^{(j,k)}_{[(u_1,u_2),l]}$
    Insert an edge with label l between 2 vertices with vertex IDs $u_1$ and
    $u_2$ into $g^{(j,k)}$ to transform to $g^{(j,k+1)}$.

Edge Deletion        $ed^{(j,k)}_{[(u_1,u_2),\bullet]}$
    Delete an edge between 2 vertices with vertex IDs $u_1$ and $u_2$ in
    $g^{(j,k)}$ to transform to $g^{(j,k+1)}$.

Edge Relabeling      $er^{(j,k)}_{[(u_1,u_2),l]}$
    Relabel an edge between 2 vertices with vertex IDs $u_1$ and $u_2$ in
    $g^{(j,k)}$ to be l to transform to $g^{(j,k+1)}$.

Since the transformations of vertex deletion vd and edge deletion ed do not
assign any labels to the vertex and the edge, respectively, they have dummy
arguments l, represented by "•".



We call $g^{(j)}$ and $g^{(j,k)}$ an interstate and intrastate, respectively. The order of
interstates represents the order of graphs in a sequence. On the other hand, the
order of intrastates is the order of graphs in the artificial interpolation.
The transformation is represented by the following TR.

Definition 2 A TR that transforms $g^{(j,k)}$ to $g^{(j,k+1)}$ is expressed as
$tr^{(j,k)}_{[o_{jk}, l_{jk}]}$, where

• tr is a transformation type, which is either insertion, deletion, or relabeling
of a vertex or an edge,

• $o_{jk}$ is an element in $ID_V(d) \cup ID_E(d)$ to be transformed, and

• $l_{jk} \in L$ is the label to be assigned to the element by the transformation. ■

For the sake of simplicity, we denote the TR $tr^{(j,k)}_{[o_{jk}, l_{jk}]}$ as $tr^{(j,k)}_{[o,l]}$ by omitting
the subscripts of $o_{jk}$ and $l_{jk}$, except where this is likely to cause ambiguity.
In [11], Inokuchi and Washio introduced six TRs as defined in Table 2. In
summary, we give the following definition of a transformation sequence.

Definition 3 An intrastate sequence $d^{(j)} = \langle g^{(j,1)} \cdots g^{(j,m_j)} \rangle$ is represented by
an "intrastate transformation sequence"
$s_d^{(j)} = \langle tr^{(j,1)}_{[o,l]} \cdots tr^{(j,m_j-1)}_{[o,l]} \rangle$. Moreover,
a graph sequence $d = \langle g^{(1)} \cdots g^{(n)} \rangle$ is represented by an "interstate
transformation sequence" $s_d = \langle s_d^{(1)} \cdots s_d^{(n-1)} \rangle$. ■

The notion of an interstate transformation sequence is far more compact than 
the original graph based representation, because only the differences between 
two successive interstates appear in the sequence. In addition, in our case, 
computing a sequence of TRs based on differences between two graphs is solvable 
in linear time because all vertices have vertex IDs. 
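
To illustrate why vertex IDs make this compilation cheap, the following minimal sketch derives the TRs between two successive graphs by dictionary lookups. The data layout, the function name `compile_step`, the vertex labels B and C, and the chosen ordering of the TRs (insertions and relabelings first, then edge deletions, then vertex deletions) are assumptions made for illustration; the compilation order used by GTRACE itself is not specified here.

```python
# Sketch: compile the difference between two successive graphs into TRs.
# A graph is a pair (vertices, edges): vertices maps vertex ID -> label and
# edges maps a sorted ID pair (u1, u2) -> label. A TR is represented as
# (type, (j, k), target, label); deletions carry the dummy label "•".

def compile_step(j, g_prev, g_next):
    v_prev, e_prev = g_prev
    v_next, e_next = g_next
    ops = []
    # Vertices are inserted or relabeled before edges attach to them.
    # sorted() is used only to make the output deterministic.
    for u in sorted(v_next):
        if u not in v_prev:
            ops.append(("vi", u, v_next[u]))
        elif v_prev[u] != v_next[u]:
            ops.append(("vr", u, v_next[u]))
    for e in sorted(e_next):
        if e not in e_prev:
            ops.append(("ei", e, e_next[e]))
        elif e_prev[e] != e_next[e]:
            ops.append(("er", e, e_next[e]))
    # Edges are deleted before vertices, so every deleted vertex is isolated.
    for e in sorted(e_prev):
        if e not in e_next:
            ops.append(("ed", e, "•"))
    for u in sorted(v_prev):
        if u not in v_next:
            ops.append(("vd", u, "•"))
    # Number the intrastates k = 1, 2, ... within interstate step j.
    return [(tr, (j, k), target, label)
            for k, (tr, target, label) in enumerate(ops, start=1)]


# The change described in the Introduction: vertex 1 with label A is inserted,
# then the edge between vertices 2 and 3 is deleted.
g_a = ({2: "B", 3: "C"}, {(2, 3): "-"})
g_b = ({1: "A", 2: "B", 3: "C"}, {})
step = compile_step(1, g_a, g_b)
```

Each vertex or edge that differs between the two graphs yields exactly one TR, so the number of TRs equals the number of differing elements.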

Example 2 In Fig. 4(a), a graph sequence is expressed as a sequence of
insertions and deletions of vertices and edges as shown in Fig. 4(b). The sequence

is compiled as {viu'^vi^ q^r^i) -] et 4?2 3) •] v( tf%'t] e ^[fi3) »] v( ^[i'»] \)> w ^ ere "~" 
denotes the label of an edge. 



2.2 Mining Relevant Frequent Transformation Subsequences 

In this section, we briefly review how GTRACE mines rFTSs (relevant frequent 
transformation subsequences) from a given set of graph sequences. To mine 
rFTSs from a set of transformation sequences, we define an inclusion relation 
between transformation sequences. 

Definition 4 Given a transformation sequence $s_p$ of a pattern and a
transformation sequence $s_d$ of a data graph sequence d, $s_p$ is a subsequence of $s_d$,
denoted as $s_p \sqsubseteq s_d$, iff there is a pair of injective functions $(\phi, \psi)$ satisfying

• there exist integers $1 \le \phi(1) < \phi(2) < \cdots < \phi(n) \le m$,

• there exist integers $\psi(u) \in ID_V(d)$ for vertex IDs u in $s_p$, and

• $\forall\, tr^{(j,k)}_{[o,l]} \in s_p \Rightarrow \exists k',\ tr^{(\phi(j),k')}_{[o',l]} \in s_d$, where $o' = \psi(u)$, if the TR
$tr^{(j,k)}_{[o,l]} \in s_p$ transforms a vertex with vertex ID u. On the other hand,
$o' = (\psi(u_1), \psi(u_2))$, if the TR transforms an edge with vertex IDs $u_1$ and
$u_2$. ■



Figure 4: A graph sequence with its transformation rules.
(a) An example of a graph sequence d.
(b) Representation of the graph sequence d by using TRs.

The first condition in Definition 4 states that $\phi$ preserves the order among
intrastate transformation sequences, while the second condition states an injective
mapping from vertex IDs in $s_p$ to vertex IDs in $s_d$. In addition, the third
condition states that a TR corresponding to any TR in $s_p$ must exist in $s_d$. The
complexity of finding an occurrence of $s_p$ in $s_d$ is identical to that of subgraph
isomorphism matching.

Example 3 Given the graph sequence d in Fig. \5j(a) represented by the trans- 
formation sequence s d = (Wr^Ws.cW 

the transformation sequence s' d — (wip'^eirfj 1 ^ _] e ^fn 2 2) °f the graph 

sequence d' in Fig. 5(b) is a subsequence of $s_d$, and the TRs in $s'_d$ match
the underlined rules in $s_d$ via $\phi(j) = j$ for $j \in \{1, 2\}$ and $\psi(i) = i + 1$ for
$i \in ID_V(d') = \{1, 2, 3\}$.

To mine FTSs consisting of mutually relevant vertices only, the relevancy of
vertices and edges is defined as follows.²

Definition 5 Vertex IDs in a graph sequence $d = \langle g^{(1)} \cdots g^{(n)} \rangle$ are relevant to
one another, and d is called a "relevant graph sequence", if the union graph $g_u(d)$
of d is a connected graph. We define the union graph of d to be $g_u(d) = (V_u, E_u)$,
where $V_u = ID_V(d)$ and $E_u = ID_E(d)$. ■

Similar to Definition 5, we define the union graph of the transformation sequence
$s_d$.

Definition 6 The union graph $g_u(s_d) = (V_u, E_u)$ of a transformation sequence
$s_d$ is similarly defined as

$V_u = \{u \mid tr^{(j,k)}_{[u,l]} \in s_d,\ tr \in \{vi, vd, vr\}\}
      \cup \{u, u' \mid tr^{(j,k)}_{[(u,u'),l]} \in s_d,\ tr \in \{ei, ed, er\}\}$,
$E_u = \{(u, u') \mid tr^{(j,k)}_{[(u,u'),l]} \in s_d,\ tr \in \{ei, ed, er\}\}$. ■



Example 4 Figure 6(b) shows the union graph of the graph sequence depicted in
Fig, fffifl). In addition, the union graph of a transformation sequence (eifo^ _i 
e [(2 3) -]) * s identical to the graph shown in Fig. \^(b). 
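
Definitions 5 and 6 translate directly into a connectivity test. The sketch below is illustrative only: it assumes a TR is stored as a tuple (type, (j, k), target, label), and the names `union_graph` and `is_relevant` are hypothetical.

```python
# Sketch: union graph of a transformation sequence (Definition 6) and the
# relevance test of Definition 5 (the union graph must be connected).

VERTEX_TRS = {"vi", "vd", "vr"}
EDGE_TRS = {"ei", "ed", "er"}


def union_graph(seq):
    """Return (V_u, E_u) for a list of TRs (type, (j, k), target, label)."""
    vertices, edges = set(), set()
    for tr, _step, target, _label in seq:
        if tr in VERTEX_TRS:
            vertices.add(target)
        elif tr in EDGE_TRS:
            u1, u2 = target
            vertices.update((u1, u2))
            edges.add((min(u1, u2), max(u1, u2)))
    return vertices, edges


def is_relevant(seq):
    """True if the union graph of seq is non-empty and connected."""
    vertices, edges = union_graph(seq)
    if not vertices:
        return False
    adjacent = {v: set() for v in vertices}
    for u1, u2 in edges:
        adjacent[u1].add(u2)
        adjacent[u2].add(u1)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:                      # depth-first search from one vertex
        v = stack.pop()
        for w in adjacent[v] - seen:
            seen.add(w)
            stack.append(w)
    return seen == vertices


# Two vertex insertions alone are not relevant; adding an edge between the
# two vertices makes the union graph connected.
disconnected = [("vi", (1, 1), 1, "A"), ("vi", (1, 2), 2, "B")]
connected = disconnected + [("ei", (1, 3), (1, 2), "-")]
```

The step indices (j, k) play no role in the union graph, which is why sequences that differ only in their timing share the same union graph.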

Figure 5: Inclusion relation. (a) A graph sequence d. (b) A subsequence of d.

Figure 6: Union graph. (a) A graph sequence d. (b) The union graph $g_u(d)$.

2 See [11] [12] for the detailed motivations for mining rFTSs.

Given a set of data $DB = \{(gid, d) \mid d = \langle g^{(1)} \cdots g^{(n)} \rangle\}$, the support value
$\sigma(s_p)$ of a transformation subsequence $s_p$ is given as

$\sigma(s_p) = |\{gid \mid (gid, d) \in DB,\ s_p \sqsubseteq s_d\}|$,

where $s_d$ is the transformation sequence of d. We call a transformation
subsequence with support value greater than or equal to a minimum support
threshold $\sigma'$ a "frequent transformation subsequence" (FTS). The anti-monotonicity
of this support value holds; that is, if $s_1 \sqsubseteq s_2$ then $\sigma(s_1) \ge \sigma(s_2)$. Using these
definitions, we state our mining problem as follows.

Problem 1 Given a dataset $DB = \{(gid, d) \mid d = \langle g^{(1)} \cdots g^{(n)} \rangle\}$ and a
minimum support threshold $\sigma'$ as input, enumerate all rFTSs.

Example 5 Figure 7 shows a graph sequence database DB that contains two
graph sequences and nine rFTSs mined from the compiled graph sequences in DB
under a' — 2. In this example, (wfi'^wp's]) an d ("'[ib]"''^'!]) are common 
subsequences of the compiled graph sequences not mined from DB, because their 
union graphs are not connected. 

To enumerate all rFTSs efficiently, GTRACE recursively mines FTSs by
appending a TR to the tail of the current FTS, in a similar manner to
PrefixSpan [17], which is a representative method for mining frequent subsequences
from a set of itemset sequences. After mining all the FTSs, GTRACE removes
all FTSs whose union graphs are not connected, thus outputting only rFTSs.

Example 6 Figure 8 shows the detailed procedure for mining FTSs up to $s_6$
using GTRACE. After mining an FTS $s_i$, GTRACE recursively appends a TR
to $s_i$ to mine a longer FTS $s_{i+1}$. After mining all FTSs, $s_2$, $s_3$, and $s_4$ are
removed during post-processing, because their union graphs are not connected.

Figure 7: Mining rFTSs from DB.

Figure 8: Mining procedure using GTRACE.




2.3 Drawback of GTRACE 

In the previous subsection, we briefly reviewed the problem of graph sequence
mining and GTRACE for mining rFTSs from graph sequences. GTRACE first
mines a set of FTSs containing a complete set of rFTSs, and then removes any
FTS that is not relevant from the set of mined FTSs during post-processing.
Since most of the FTSs mined in the first step are not rFTSs, excessive
computation time is needed to mine the vast set of FTSs, causing GTRACE to be
highly inefficient.

For example, the FTS $s_6$ shown in Fig. 8 is relevant, because its union
graph is connected. To mine this rFTS, GTRACE mines $s_1$, $s_2$, $s_3$, $s_4$, and
$s_5$ in order, appending a TR to the tail of $s_i$ to obtain a new FTS $s_{i+1}$, where
$i = 1, \cdots, 5$. Then, $s_2$, $s_3$, and $s_4$ are removed during post-processing, because
they are irrelevant. Therefore, to mine all rFTSs using GTRACE, a vast set
of FTSs, most of which are not rFTSs, first needs to be mined, resulting in
inefficient execution of GTRACE. In our experiment, 95% of FTSs mined by
GTRACE are irrelevant. By designing a new algorithm that mines only rFTSs,
we can reduce the computation time for mining the complete set of rFTSs. In
the next section, we propose a new method for mining only rFTSs from a set of
graph sequences based on the principle of a reverse search.

Figure 9: Enumeration tree for items {A, B, C, D}.




3 Proposed Method 

3.1 Reverse Search 

To mine all rFTSs efficiently, we propose a method for mining only rFTSs based
on a reverse search [2] [1]. The underlying concept of the reverse search is a
general scheme for designing efficient algorithms for hard enumeration problems.
Let S be the set of solutions. In a reverse search, we define the parent-child
relation $P \subseteq S \times S$ such that each $x \in S$ has a unique parent $P(x) \in S$. Using
the search tree T over S defined by P, we can enumerate all solutions of S
without duplicates by traversing the search tree T from the root to the leaves.
Therefore, if we can define the parent-child relation P such that each $x \in S$ has
a "unique" parent, we can enumerate a complete set of elements in S.

For example, given a set of items $I = \{i_1, i_2, \cdots\}$ and a set of transactions
$D = \{t \mid t \subseteq I\}$, the frequent itemset mining problem is to enumerate all
frequent itemsets $S = \{x \mid x \subseteq I,\ \sigma(x) \ge \sigma',\ \sigma(x) = |\{t \mid x \subseteq t,\ t \in D\}|\}$, where
$\sigma'$ is the minimum support threshold. To enumerate the complete set of frequent
itemsets efficiently, the parent-child relation P is defined as the unique itemset
$x' = \{i_1, i_2, \cdots, i_{k-1}\}$ derived from $x = \{i_1, \cdots, i_{k-1}, i_k\}$ by removing the
last item in x, where the items in x and x' are sorted according to the linear
order of the items. Starting from the root node corresponding to an empty set,
traversing the search tree T over S defined by P enables us to enumerate a
complete set of frequent itemsets without duplicates. In the traversal, $P^{-1}(x')$
is used to enumerate itemsets from itemset $x' \in S$ by adding an item to x'.
Figure 9 shows the enumeration tree for the itemset I = {A, B, C, D}, with $2^{|I|}$
nodes in the tree. By traversing from the root node in a reverse direction to the
arrows in the tree, all itemsets are enumerated.
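
The itemset case can be sketched in a few lines. This is a minimal illustration of the reverse-search idea rather than an optimized miner: transactions are assumed to be Python sets, the function name is hypothetical, and the empty root itemset is not reported.

```python
# Sketch: reverse search for frequent itemsets. The parent of an itemset is
# obtained by removing its last (largest) item, so the children of a parent
# are exactly the itemsets formed by appending a strictly larger item.

def frequent_itemsets(transactions, min_support):
    items = sorted(set().union(*transactions))
    result = []

    def support(itemset):
        wanted = set(itemset)
        return sum(1 for t in transactions if wanted <= t)

    def expand(parent):
        # Only items after the parent's last item are tried, so every
        # itemset is generated from its unique parent and never twice.
        start = items.index(parent[-1]) + 1 if parent else 0
        for item in items[start:]:
            child = parent + (item,)
            # Anti-monotonicity: an infrequent child has no frequent
            # descendants, so its subtree is skipped.
            if support(child) >= min_support:
                result.append(child)
                expand(child)

    expand(())
    return result


transactions = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C", "D"}]
found = frequent_itemsets(transactions, 2)
```

With a minimum support of 2, the traversal visits {A}, {A, B}, {A, C}, {B}, {B, C}, and {C} once each, and it never reaches {A, B, C} or any itemset containing D.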
However, as mentioned in Section 2.3, if the last TR in an rFTS $s_i$
($i = 2, \cdots, 6$) is removed to derive its parent FTS $s_{i-1}$, $s_{i-1}$ is not always relevant,
although $s_{i-1}$ is always frequent according to the anti-monotonicity of the
support values. Therefore, the original GTRACE needs all the FTSs to enumerate
all the rFTSs, which results in inefficient execution of GTRACE. In this
section, to enumerate only rFTSs, we define canonical forms of rFTSs and novel
parent-child relations $P_1$, $P_2$, and $P_3$ between the canonical rFTSs.

Definition 7 Given an rFTS s, TRs $a_k, \cdots, a_1$ in s are removed by either
$P_1$, $P_2$, or $P_3$ in order as defined later. We define a code $\alpha = (a_1, \cdots, a_k)$ for
s, where vertex IDs in $g_u(s)$ are assigned in a depth-first manner similarly to
gSpan. For codes $\alpha = (a_1, \cdots, a_k)$ and $\beta = (b_1, \cdots, b_h)$, $\alpha \preceq \beta$, iff either of the
following conditions is true:

• $\exists t$, $1 \le t \le \min(k, h)$, $a_q = b_q$ for $q < t$, and $a_t \prec_{tr} b_t$

• $a_q = b_q$ for $1 \le q \le k$, and $k \le h$,

where $\prec_{tr}$ is the linear order among TRs $tr^{(j,k)}_{[o,l]}$. Since there are many
representations for an identical transformation sequence, the representation
corresponding to the minimal code among the representations of an identical
transformation sequence is defined as canonical. ■

In Definition 7, $P_1$, $P_2$, and $P_3$ are parent-child relations among canonical rFTSs
S, which are defined as follows.

Definition 8 Given an rFTS $s \in S$ containing TRs applied to vertices, we
define function $P_1$ mapping from s to s'. The transformation sequence s' is
derived from s by removing the TR located in the last position of all the TRs
applied to vertices in s. ■

If the length of transformation sequence s is defined as the number of TRs in s,
the following lemma is obtained.

Lemma 1 Given an rFTS $s \in S$ containing TRs that are applied to vertices
and with length greater than 1, $g_u(P_1(s)) = g_u(s)$. ■

Proof 1 The union graph of s is a connected graph, because s is relevant. If
the vertex ID, to which the TR r removed by $P_1$ is applied, is u, a TR to
transform an edge, whose terminal vertex ID is u, must exist in s, because $g_u(s)$
is connected. Therefore, vertex u remains in $g_u(P_1(s))$ after r is removed, and
the union graph of transformation sequence $P_1(s)$ is isomorphic with $g_u(s)$. On
the other hand, given an rFTS $s \in S$ of length 1 containing a TR that is applied
to a vertex, $P_1(s) = \bot$.

According to Lemma 1, $P_1(s)$ is always relevant for an rFTS s containing TRs
applied to vertices. In addition, the transformation sequence $P_1(s)$ is frequent
according to the anti-monotonicity of the support value and canonical according
to Definition 7. Therefore, $P_1(s)$ returns a "unique" rFTS in S.
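
The effect of $P_1$ and the statement of Lemma 1 can be illustrated with a short sketch. The tuple layout (type, (j, k), target, label), the function names, and the sample sequence `s6` (including its step indices and the label C) are assumptions made for illustration; in particular, the sketch does not renumber step indices or recompute canonical forms after a removal.

```python
# Sketch of P1: remove the last TR that is applied to a vertex.

VERTEX_TRS = {"vi", "vd", "vr"}
EDGE_TRS = {"ei", "ed", "er"}


def union_graph(seq):
    """Return (V_u, E_u) of a list of TRs, following Definition 6."""
    vertices, edges = set(), set()
    for tr, _step, target, _label in seq:
        if tr in VERTEX_TRS:
            vertices.add(target)
        else:
            vertices.update(target)
            edges.add(tuple(sorted(target)))
    return vertices, edges


def p1(seq):
    """Return the parent under P1, or None (standing for ⊥) if it is empty."""
    for pos in range(len(seq) - 1, -1, -1):
        if seq[pos][0] in VERTEX_TRS:
            parent = seq[:pos] + seq[pos + 1:]
            return parent if parent else None
    raise ValueError("P1 needs at least one TR applied to a vertex")


# Hypothetical rFTS: three vertex insertions, two edge insertions, and one
# edge deletion over the path 1 - 2 - 3.
s6 = [
    ("vi", (1, 1), 1, "A"),
    ("vi", (1, 2), 2, "B"),
    ("vi", (1, 3), 3, "C"),
    ("ei", (1, 4), (1, 2), "-"),
    ("ei", (1, 5), (2, 3), "-"),
    ("ed", (2, 1), (2, 3), "•"),
]
s5 = p1(s6)          # drops the insertion of vertex 3
s3 = p1(p1(s5))      # only TRs applied to edges remain
```

Every removed vertex still appears as an endpoint of some edge TR, so the union graph is unchanged at each step, as Lemma 1 states.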
Example 7 Given $s_6$ in Fig. 8, $P_1(s_6)$ and $P_1(P_1(s_6))$ are

( V *[l,A] V *[ 2 's] e *[(l>2).-] e *[(2 1 3),-] e ^[(2,3),-]) and 
\ W '[l,A] e< [(l,2)H [(2,3),-] [( 2 > 3 )H'' 

respectively. Union graphs of $s_6$, $P_1(s_6)$, and $P_1(P_1(s_6))$ are isomorphic to the
graph shown in Fig. 6(b).

Next, we define the second function of our parent-child relation to enumerate 
rFTSs. 

Definition 9 Given an rFTS $s' \in S$ that contains only TRs applied to edges,
we define function $P_2$ mapping from s' to s''. The transformation sequence s'' is
derived from s' by removing the TR $tr^{(j,k)}_{[o,l]}$ located in the last position of all the
TRs in s' if a TR $tr^{(j',k')}_{[o',l']}$ exists in s' such that $o = o'$ and $j' < j$. ■

If the rFTS s contains TRs that are applied to vertices, we can obtain an rFTS
s' that does not contain TRs applied to vertices by applying $P_1$ to s multiple
times. In addition, $P_2$ is applicable to s', if the length of s' is greater than the
number of edges in $g_u(s')$, since at least one TR needs to be applied to each
edge in $E(g_u(s'))$. According to the above definition, the following lemma is
obtained.

Lemma 2 Given an rFTS $s' \in S$ containing only TRs applied to edges and
whose length is greater than $|E(g_u(s'))|$, $g_u(P_2(s'))$ is identical to $g_u(s')$. ■

This lemma is proven in the same way as Lemma 1. According to Lemma 2,
$P_2(s')$ is always relevant for an rFTS s' containing only TRs applied to edges.
In addition, the transformation sequence $P_2(s')$ is frequent according to the
anti-monotonicity of the support value and canonical according to Definition 7.
Therefore, $P_2(s')$ returns a "unique" rFTS in S.
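
A minimal sketch of $P_2$, reading Definition 9 literally: the last TR is dropped only when some TR of an earlier interstate step acts on the same edge. The tuple layout (type, (j, k), target, label), the function name, and the sample sequence with its step indices are assumptions for illustration, not the paper's data.

```python
# Sketch of P2 for sequences that contain only TRs applied to edges.

EDGE_TRS = {"ei", "ed", "er"}


def p2(seq):
    """Return the parent under P2, or None if P2 is not applicable."""
    if any(tr[0] not in EDGE_TRS for tr in seq):
        raise ValueError("P2 expects only TRs applied to edges")
    if not seq:
        return None
    _, (j, _k), edge, _ = seq[-1]
    for _, (j_prev, _k_prev), edge_prev, _ in seq[:-1]:
        # Same edge, touched in an earlier interstate step (j' < j).
        if edge_prev == edge and j_prev < j:
            return seq[:-1]
    return None


# Hypothetical sequence: edge (2, 3) is inserted in step 1 and deleted in
# step 2, so the deletion can be removed without changing the union graph.
s3 = [
    ("ei", (1, 1), (1, 2), "-"),
    ("ei", (1, 2), (2, 3), "-"),
    ("ed", (2, 1), (2, 3), "•"),
]
s2 = p2(s3)
```

Applying `p2` to `s2` returns `None`, because each remaining TR acts on a different edge; from that point on only $P_3$ applies.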

Example 8 Given s' 3 = ^fi^l) ,-f d f(2l) ' P 2(4) = ( ei [(i%,-] ei [(2%,-]) ■ 

According to Definition^ we cannot apply P2 to s' 3 . 

Finally, we define the third function of our parent-child relation to enumerate 
rFTSs. 

Definition 10 Given an rFTS $s'' \in S$ where each TR is applied to a mutually
different edge in $g_u(s'')$, we define function $P_3$ mapping from s'' to s'''. The
transformation sequence s''' is derived from s'' by removing the TR located in
the last position in s'' while keeping the connectivity of $g_u(s''')$. ■

If each TR in an rFTS s is not applied to a mutually different edge in $g_u(s)$, we
can obtain an rFTS s'' where each TR is applied to a mutually different edge in
$g_u(s'')$ by applying $P_1$ and $P_2$ to s multiple times. According to Definition 10,
$P_3(s'')$ is always relevant for an rFTS s'' where each TR is applied to a
mutually different edge in $g_u(s'')$, because the connectivity of $g_u(P_3(s''))$ is kept.
In addition, the transformation sequence $P_3(s'')$ is frequent according to the
anti-monotonicity of the support value and canonical according to Definition 7.
Therefore, $P_3(s'')$ returns a "unique" rFTS in S.

Figure 10: Parent-child relation between rFTSs.

Example 9 Since the rFTS $s_6$ shown in Fig. 10 contains TRs that are applied
to vertices, $P_1$ is applied to $s_6$ three times until the transformation sequence
does not contain TRs applied to vertices, and we obtain $s'_5$, $s'_4$, and $s'_3$. Next,
$P_2$ is applied to $s'_3$ once, resulting in $s'_2$ where each TR is applied to a
mutually different edge in $g_u(s'_2)$. Finally, $P_3$ is applied to $s'_2$ twice until the
transformation sequence becomes $\bot$.

We have defined our parent-child relation between rFTSs in terms of three
functions $P_1$, $P_2$, and $P_3$. Using the search tree T over S defined by
$P = \{P_1, P_2, P_3\}$, we can enumerate a complete set of only rFTSs of S by traversing
search tree T from its root. In the traversal, $P_1^{-1}$, $P_2^{-1}$, and $P_3^{-1}$ are used to
enumerate rFTSs from the current rFTS by adding a TR.



3.2 GTRACE-RS

In the previous subsection, we defined functions $P_1$, $P_2$, and $P_3$ for the parent-child relation between rFTSs. By traversing the tree constructed by parent-child relations in a reverse direction, we enumerate a complete set of rFTSs without enumerating any FTSs that are not also rFTSs. Figure 11 gives the pseudo code for the proposed method "GTRACE-RS" that mines all rFTSs from graph sequences $DB$ and accumulates them in $S$ with the minimum support value $\sigma'$. The method traverses the search tree $T$ over $S$ defined by $P = \{P_1, P_2, P_3\}$ in a depth-first manner. In Fig. 11, $s_p \diamond r$ means that a TR $r$ is added to $s_p$ such that $s_p \diamond r \in P_i^{-1}(s_p)$, where $i = 1, 2, 3$. In this process, a TR to be added is not always appended to the tail of $s_p$, unlike the original GTRACE. The test $s_p \neq min$ checks whether $s_p$ has been discovered before, where $min$ is the canonical form of $s_p$. First, given an rFTS $s_p$ that contains only TRs applied to edges and where each TR is applied to a mutually different edge in $g_u(s_p)$, GTRACE-RS enumerates rFTSs that are in $P_3^{-1}(s_p)$. Next, given an rFTS that contains only TRs applied to edges, GTRACE-RS enumerates rFTSs that are in $P_2^{-1}(s_p)$. Finally, given an rFTS, GTRACE-RS enumerates rFTSs that are in $P_1^{-1}(s_p)$.

Input: an rFTS $s_p$, a dataset $DB$, $\sigma'$, and $i = 3$.
Output: the set of rFTSs $S$.

GTRACE-RS($s_p$, $DB$, $\sigma'$, $S$, $i$)
    if $s_p \neq min$, then
        return;
    insert $s_p$ into $S$;
    while $i > 0$
        Call Subprocedure($s_p$, $DB$, $\sigma'$, $S$, $i$);
        $i = i - 1$;
    return;

Subprocedure($s_p$, $DB$, $\sigma'$, $S$, $i$)
    set $C$ to $\emptyset$;
    scan $DB$,
        find all transformation rules $r$ s.t. $s_p \diamond r \in P_i^{-1}(s_p)$;
        insert $s_p \diamond r$ in $C$ and count its frequency;
    for each frequent $s_p \diamond r$ in $C$ do
        Call GTRACE-RS($s_p \diamond r$, $DB$, $\sigma'$, $S$, $i$);
    return;

Figure 11: Algorithm for enumerating rFTSs.
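
The control flow of Fig. 11 can be illustrated with a minimal Python sketch. It is not the authors' implementation. A pattern is replaced by a toy triple of counters (a, b, c), standing in for the number of TRs added through $P_1^{-1}$, $P_2^{-1}$, and $P_3^{-1}$, respectively. The canonical-form test is omitted because the toy expansion never reaches the same triple twice, and the support test is replaced by a hypothetical size bound. The names `gtrace_rs_toy`, `make_expander`, and `is_frequent` are assumptions introduced only for this illustration.

```python
from typing import Callable, Dict, List, Tuple

Pattern = Tuple[int, int, int]  # toy stand-in for an rFTS: counters (a, b, c)
Expander = Callable[[Pattern], List[Pattern]]


def make_expander(index: int) -> Expander:
    """Toy inverse-parent function: add one 'TR' of the kind stored at `index`."""
    def expand(s: Pattern) -> List[Pattern]:
        child = list(s)
        child[index] += 1
        return [(child[0], child[1], child[2])]
    return expand


def gtrace_rs_toy(s_p: Pattern,
                  is_frequent: Callable[[Pattern], bool],
                  out: List[Pattern],
                  i: int,
                  expanders: Dict[int, Expander]) -> None:
    """Mirror the loop of Fig. 11: expand with phase i, then i-1, down to 1."""
    out.append(s_p)
    while i > 0:
        for child in expanders[i](s_p):
            if is_frequent(child):
                # The child inherits the current phase i, so a pattern grown
                # in phase 2 is never extended by the phase-3 expander again.
                gtrace_rs_toy(child, is_frequent, out, i, expanders)
        i -= 1


# Phase 3 grows counter c, phase 2 grows b, phase 1 grows a (toy assumption).
expanders = {3: make_expander(2), 2: make_expander(1), 1: make_expander(0)}
found: List[Pattern] = []
gtrace_rs_toy((0, 0, 0), lambda s: sum(s) <= 3, found, 3, expanders)
print(len(found), len(set(found)))
```

Because every triple has exactly one parent (undo phase 1 first, then phase 2, then phase 3), each one is visited once, which is the property that lets a reverse search avoid storing and comparing previously generated patterns in this toy setting.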

4 Implementation 

In the previous section, we proposed a new method for mining only rFTSs from 
graph sequences. To mine the rFTSs efficiently, an efficient implementation of 
the method is also important. In this section, we explain how to implement the 
proposed method. 

4.1 Implementation of $P_3^{-1}$

Given an rFTS $s_p$ that contains only TRs applied to edges and where each TR is applied to a mutually different edge in $g_u(s_p)$, GTRACE-RS enumerates rFTSs in $P_3^{-1}(s_p)$ from $s_p$. During this process, the number of edges in $g_u(P_3^{-1}(s_p))$ increases one by one, similarly to gSpan [19]. We have therefore implemented $P_3^{-1}$ based on gSpan.³ Since gSpan is an algorithm for mining frequent subgraph patterns from colored graphs,⁴ it cannot be applied directly to graph sequences. We extend the tuple $(u, u', c)$ of the DFS code used in gSpan to a tuple $(u, u', (l, tr, j))$ that represents a TR $tr^{(j)}_{[(u,u'),l]}$. An rFTS $s_p$ is extended based on the pattern-growth principle. Given a code $\alpha = (a_1, \dots, a_k)$ for $s_p$, $\alpha$ is extended by adding a tuple $a_{k+1}$ in the form of $(u, u', (l, tr, j))$ to generate $\alpha' = (a_1, \dots, a_k, a_{k+1})$. A TR, represented by the tuple, to transform an edge between vertices $u$ and $u'$ on the rightmost path in $g_u(s_p)$ can be added (backward extension), or a TR, represented by the tuple, to transform an edge between the rightmost vertex $u$ and another vertex $u'$ that does not exist in $g_u(s_p)$ can be added (forward extension).

3 Despite having defined canonical codes of rFTSs and implemented $P_3^{-1}$ using gSpan, we could have implemented these similarly using FSG [14] or Gaston [16], which are algorithms for solving the frequent graph mining problem.

4.2 Projection 

After mining rFTSs using $P_3^{-1}$, we shrink the transformation sequences in $DB$ for the sake of computational efficiency. We call this procedure the projection. When $P_1^{-1}$ and $P_2^{-1}$ grow an rFTS $s_p$ by adding TRs one by one, the union graphs of any rFTSs generated by $P_1^{-1}$ and $P_2^{-1}$ are isomorphic with $g_u(s_p)$ according to Lemmas 1 and 2; that is, vertex IDs in any TRs added to $s_p$ by $P_1^{-1}$ and $P_2^{-1}$ are identical with vertex IDs in $s_p$ (A). In addition, when $P_2^{-1}$ grows an rFTS by adding TRs $tr^{(j)}_{[o,l]}$ one by one, there must exist a TR $tr^{(j')}_{[o',l']}$ in $s_p$ such that $o = o'$ and $j' < j$ according to Definition 9 (B). Based on the above discussion, we define the projection as follows.

Definition 11 Given a transformation sequence $s_d$ of $(gid, d) \in DB$ and an rFTS $s_p$ mined by $P_3^{-1}$ such that $s_p \sqsubseteq s_d$ via $(\phi, \psi)$, we define the projection of $s_d$ onto its maximum subsequence $s'_d$ satisfying the following conditions.

(A) $\{\psi(u), \psi(u') \mid tr^{(j)}_{[(u,u'),l]} \in s_p\} = \{u, u', u'' \mid tr^{(j)}_{[(u,u'),l]} \in s'_d,\ tr^{(j)}_{[u'',l]} \in s'_d\}$.

(B) TRs $tr^{(j')}_{[o',l']} \in s_d$ such that $\psi(o) = o'$ and $\phi(j) < j'$ are included in $s'_d$ if $tr^{(j)}_{[o,l]} \in s_p$.

Projected transformation sequences are used in the implementation of $P_1^{-1}$ and $P_2^{-1}$, as explained in Section 4.3.
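
As a rough illustration of condition (A), the following Python sketch keeps only those TRs of a transformation sequence whose vertex IDs all lie in the image of a vertex mapping $\psi$. It is a simplified sketch, not the authors' code: condition (B) is not modeled, and the `TR` record, the `project` function, and the sample sequence `s_d` are hypothetical and do not reproduce any figure of the paper.

```python
from typing import Dict, List, NamedTuple, Tuple, Union


class TR(NamedTuple):
    """Hypothetical record for a transformation rule tr^{(j)}_{[o, l]}."""
    kind: str                         # e.g. "vi" (vertex insertion), "ei" (edge insertion)
    j: int                            # interstate index
    obj: Union[int, Tuple[int, int]]  # vertex ID, or (u, u') for an edge
    label: str


def vertex_ids(tr: TR) -> Tuple[int, ...]:
    """Return the vertex IDs that a TR refers to."""
    return (tr.obj,) if isinstance(tr.obj, int) else tuple(tr.obj)


def project(s_d: List[TR], psi: Dict[int, int]) -> List[TR]:
    """Keep the TRs whose vertex IDs all correspond to vertex IDs of s_p."""
    image = set(psi.values())
    return [tr for tr in s_d if all(v in image for v in vertex_ids(tr))]


# Assumed mapping from vertex IDs of s_p to vertex IDs of s_d.
psi = {1: 1, 2: 2, 3: 4}
s_d = [
    TR("vi", 1, 1, "A"),
    TR("vi", 1, 2, "B"),
    TR("vi", 1, 3, "C"),        # vertex 3 is not in the image of psi
    TR("ei", 2, (1, 2), "-"),
    TR("ei", 3, (2, 3), "-"),   # touches vertex 3, so it is dropped as well
    TR("vi", 3, 4, "C"),
    TR("ei", 4, (2, 4), "-"),
]
projected = project(s_d, psi)
print([tr.obj for tr in projected])
```

The order of the surviving TRs is preserved, so the projected sequence remains a subsequence of the original one.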

4.3 Implementation of $P_1^{-1}$ and $P_2^{-1}$

In the rest of this section, using a concrete example, we explain our implementation of $P_1^{-1}$ and $P_2^{-1}$ to enumerate rFTSs. In Section 3, we defined $P_1$ and $P_2$ separately for ease of explanation. However, in our implementation, $P_1^{-1}$ and $P_2^{-1}$ are implemented jointly for the sake of efficiency. For the sake of simplicity, in this subsection, we assume that $DB$ contains only two graph sequences $(gid_1, d_1)$ and $(gid_2, d_2)$ as shown in Fig. 12 and that an rFTS $s_p$ has been mined from $DB$ using $P_3^{-1}$ under $\sigma' = 2$. In addition, we consider only two of the mappings from vertex IDs in $s_p$ to vertex IDs in the transformation sequences $s_{d_1}$ and $s_{d_2}$, although there are other mappings. The two mappings $\psi_1$ and $\psi_2$ are given as

$\psi_1(1) = 1,\ \psi_1(2) = 2,\ \psi_1(3) = 4,$   (1)
$\psi_2(1) = 3,\ \psi_2(2) = 1,\ \psi_2(3) = 4.$   (2)

Figure 12: Reassignment of vertex IDs (values for k are omitted in this figure).

4 We use the term "color" instead of "label" to distinguish labels of graphs in frequent graph mining problems and labels of graph sequences in our problem.

In this example, vertex ID 1 in $s_p$ corresponds to vertex ID 3 in $s_{d_2}$, because $\psi_2(1) = 3$. Since vertex ID 3 in $s_{d_1}$ and vertex IDs 2 and 3 in $s_{d_2}$ do not correspond to any vertex IDs in $s_p$, the underlined TRs in $s_{d_1}$ and $s_{d_2}$ in Fig. 12 are removed in the projection, and the projected transformation sequences $s'_{d_1}$ and $s'_{d_2}$ shown in the middle of Fig. 12 are derived. Transformation sequences $s'_{d_1}$ and $s'_{d_2}$ are used in $P_1^{-1}$ and $P_2^{-1}$ to grow $s_p$. Therefore, all rFTSs mined from $s'_{d_1}$ and $s'_{d_2}$ must contain $s_p$ as a subsequence.

Subsequently, to check efficiently whether TRs in $s'_{d_1}$ and $s'_{d_2}$ correspond to each other, we convert the projected transformation sequences $s'_{d_1}$ and $s'_{d_2}$ by reassigning all vertex IDs in the projected transformation sequences, as described below. We are aware of all the embeddings of $s_p$ in $s'_{d_1}$ and $s'_{d_2}$, as well as their mappings from vertex IDs in $s_p$ to vertex IDs in $s'_{d_1}$ and $s'_{d_2}$ as given by Eqs. (1) and (2). In this example, we reassign vertex ID 4 in $s'_{d_1}$ to vertex ID 3, because we know $\psi_1(3) = 4$. The reassigned transformation sequences $s''_{d_1}$ and $s''_{d_2}$ are shown at the bottom of Fig. 12. By reassigning the vertex IDs in $s'_{d_1}$ and $s'_{d_2}$, corresponding TRs are written in the same representation, except for $j$ in $tr^{(j)}_{[o,l]}$. For example, a TR $vi_{[2,B]}$ is written in the same representation in $s''_{d_1}$ and $s''_{d_2}$, although the rule is written as $vi_{[2,B]}$ and $vi_{[1,B]}$ in $s'_{d_1}$ and $s'_{d_2}$, respectively.

Next, because the corresponding rules are written in the same representation, we further convert the reassigned transformation sequences $s''_{d_1}$ and $s''_{d_2}$ in Fig. 12 to sequences of itemsets as follows.

$s_1 = (gid_1, \langle (i_1\ i_2)\ (i_3\ i_4)\ i_5\ i_6\ i_7 \rangle)$,   (3)
$s_2 = (gid_2, \langle (i_1\ i_2\ i_3)\ i_4\ i_6\ (i_8\ i_9) \rangle)$,   (4)

where $vi_{[1,A]}$ in $s''_{d_1}$ and $s''_{d_2}$ is converted to $i_1$, $vi_{[2,B]}$ to $i_2$, and so on. In addition, items in $s_1$ and $s_2$ are placed in the same parentheses if their corresponding TRs exist in the same intrastate transformation sequence. For brevity, parentheses are omitted if an itemset has only one item. In these sequences of itemsets, the underlined items $i_4$ and $i_6$ appear in all of the converted sequences as a subsequence $\langle i_4\ i_6 \rangle$, because $s = \langle i_4\ i_6 \rangle$ corresponds to $s_p$. To shorten the converted sequences, we remove the subsequence $s$ from the converted sequences to obtain other converted sequences as follows.

$s'_1 = (gid_1, \langle (i_1^{<1st}\ i_2^{<1st})\ i_3^{=1st}\ i_5^{<2nd}\ i_7 \rangle)$,
$s'_2 = (gid_2, \langle (i_1^{<1st}\ i_2^{<1st}\ i_3^{<1st})\ (i_8\ i_9) \rangle)$,

where $i_3^{<1st}$ denotes that $i_3$ appears before the 1st item in $s$, and $i_3^{=1st}$ denotes that $i_3$ appears at the same time as the 1st item in $s$. Using this representation and $s$, we can re-convert $s'_1$ and $s'_2$ to $s_1$ and $s_2$, respectively. Now, we have two sequences of itemsets $s'_1$ and $s'_2$. By applying PrefixSpan [17] to the sequences of itemsets under $\sigma' = 2$, we obtain three frequent sequential patterns $\{\langle i_1^{<1st} \rangle, \langle i_2^{<1st} \rangle, \langle (i_1^{<1st}\ i_2^{<1st}) \rangle\}$.⁵ After mining the frequent sequential patterns, we re-convert each item in the frequent sequential patterns to a TR to obtain three rFTSs: $\langle vi^{(1)}_{[1,A]}\ ei^{(2)}_{[(1,2),-]}\ ei^{(3)}_{[(2,3),-]} \rangle$, $\langle vi^{(1)}_{[2,B]}\ ei^{(2)}_{[(1,2),-]}\ ei^{(3)}_{[(2,3),-]} \rangle$, and $\langle vi^{(1)}_{[1,A]}\ vi^{(1)}_{[2,B]}\ ei^{(2)}_{[(1,2),-]}\ ei^{(3)}_{[(2,3),-]} \rangle$.

In PrefixSpan, we can check in $O(1)$ whether TRs in $s''_{d_1}$ and $s''_{d_2}$ correspond to each other, because two corresponding TRs in the two reassigned transformation sequences have the same vertex IDs. By reassigning vertex IDs in projected transformation sequences and converting transformation sequences to sequences of itemsets, we avoid graph isomorphism matching between two transformation sequences. Since we can quickly check whether two TRs correspond by comparing two items, we can efficiently mine rFTSs from large and long graph sequences.
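
The reassignment and item conversion described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the `TR` record, `reassign`, and `to_items` are hypothetical names, the two short projected sequences are invented for the example (only the mappings follow Eqs. (1) and (2)), edges are treated as undirected, and the grouping of items into itemsets per interstate is left out.

```python
from typing import Dict, List, NamedTuple, Tuple, Union

Obj = Union[int, Tuple[int, int]]


class TR(NamedTuple):
    """Hypothetical record for a transformation rule tr^{(j)}_{[o, l]}."""
    kind: str
    j: int
    obj: Obj
    label: str


def reassign(s_proj: List[TR], psi: Dict[int, int]) -> List[TR]:
    """Rename vertex IDs of a projected sequence to the vertex IDs of s_p."""
    inverse = {v: u for u, v in psi.items()}

    def rename(obj: Obj) -> Obj:
        if isinstance(obj, int):
            return inverse[obj]
        a, b = sorted((inverse[obj[0]], inverse[obj[1]]))  # undirected edge
        return (a, b)

    return [tr._replace(obj=rename(tr.obj)) for tr in s_proj]


def to_items(seq: List[TR], table: Dict[Tuple[str, Obj, str], int]) -> List[int]:
    """Map each TR to an integer item; the interstate index j is ignored."""
    return [table.setdefault((tr.kind, tr.obj, tr.label), len(table) + 1)
            for tr in seq]


psi_1 = {1: 1, 2: 2, 3: 4}   # Eq. (1)
psi_2 = {1: 3, 2: 1, 3: 4}   # Eq. (2)

# Invented projected sequences: vi_[2,B] in the first, vi_[1,B] in the second.
s_d1_proj = [TR("vi", 1, 2, "B"), TR("ei", 2, (1, 2), "-")]
s_d2_proj = [TR("vi", 1, 1, "B"), TR("ei", 3, (1, 3), "-")]

table: Dict[Tuple[str, Obj, str], int] = {}
items_1 = to_items(reassign(s_d1_proj, psi_1), table)
items_2 = to_items(reassign(s_d2_proj, psi_2), table)
print(items_1, items_2)
```

After reassignment, corresponding TRs share one dictionary key, so testing whether two TRs correspond reduces to comparing two integers.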

5 Experiments 

The proposed method was implemented in C++. The experiments were executed on an HP Z600 computer with an Intel Xeon X5560 2.80 GHz processor and 4 GB of main memory, running Windows 7. The performance of the proposed method was evaluated using artificial and real-world graph sequence data.

Table 3: Parameters of the artificial datasets.

Parameter                                                               Default value
Probability of vertex and edge insertions in transformation sequences   $p_i = 80\%$
Probability of vertex and edge deletions in transformation sequences    $p_d = 10\%$
Average number of vertex IDs in transformation sequences                $|V_{avg}| = 6$
Average number of vertex IDs in embedded FTSs                           $|V'_{avg}| = 3$
Number of vertex labels                                                 $|L_v| = 5$
Number of edge labels                                                   $|L_e| = 5$
Number of embedded FTSs                                                 $N = 10$
Number of transformation sequences                                      $|DB| = 1,000$
Edge existence probability between vertices                             $p_e = 15\%$
Average edit distance between interstates                               $d_{ist} = 2$
Minimum support threshold                                               $\sigma' = 10\%$



5.1 Artificial Datasets 

We compared the performance of the proposed method with the original GTRACE [IT] 
using artificial datasets generated from the parameters listed in Table 3. First, starting from a graph with $|V_{avg}|/2$ vertices generated with edge existence probability $p_e$, we grew each graph sequence to include $|V_{avg}|$ vertex IDs on average, by applying two (= $d_{ist}$) of (A) inserting with probability $p_i$, (B) deleting with probability $p_d$, and (C) relabeling vertices and edges with probability $1 - p_i - p_d$ at each interstate. Accordingly, if $p_i$ is small or $|V_{avg}|$ is large, the generated transformation sequence is long. This process is continued until the sequence becomes relevant by increasing the numbers of vertices and edges. We generated $|DB|$ graph sequences. Similarly, we generated $N$ rFTSs with $|V'_{avg}|$ vertex IDs on average. We then generated the $DB$ in which each graph sequence was overlaid by an rFTS with probability $1/N$. Each graph sequence contained $|L_v|$ vertex labels and $|L_e|$ edge labels.

Table 4 lists the computation times [sec], the numbers of rFTSs mined by the proposed method, and the numbers of FTSs mined by the first step of the original GTRACE for varying values of $|DB|$, $|V_{avg}|$, $p_i$, $|L_e|$, and $\sigma'$, with the other parameters set to their default values. In the table, "-" indicates that no results were obtained because of intractable computation times exceeding 2 hours. In addition, PM and GT denote the proposed method and GTRACE, respectively.

The first part of Table 4 shows that the computation time is proportional to the number of graph sequences $|DB|$, as is the case in conventional frequent pattern mining. The second and third parts of the table indicate that the computation times for both GTRACE-RS and GTRACE are exponential with respect to an increase in the average number of vertices $|V_{avg}|$ in the graph sequences and a decrease in the probability $p_i$ of vertex and edge insertions in the graph sequence. The main reason that the computation time increases with the average length seems to be the increase in the numbers of rFTSs in both cases. However, the far superior efficiency of the proposed method compared to GTRACE is confirmed by the computation times. The fourth part of Table 4 shows the effect of the number of labels on the efficiency. When $|L_e|$ is small, many transformation subsequences are isomorphic with each other, and thus the computation times for GTRACE and GTRACE-RS increase. However, the computation time for GTRACE-RS remains smaller, since it mines only rFTSs. The fifth part of Table 4 shows that the proposed method is tractable even with a low minimum support threshold.

5 Although 17 frequent sequential patterns are mined from $s_1$ and $s_2$, some of these are not rFTSs. We discuss this in detail in Section 6.

All parts of Table 4 show that the number of rFTSs mined by the proposed method is much smaller than the number of FTSs mined by GTRACE. By mining only rFTSs, the proposed method efficiently mines a complete set of rFTSs from a set of graph sequences. Therefore, the proposed method is applicable in practice to graph sequences that are both long and large.
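
To make the gap between the two methods concrete, the short Python sketch below computes the ratio of the computation times reported in the $|L_e|$ part of Table 4. The numbers are copied from that table; the variable names are ours, and the ratio is only a descriptive summary of those four measurements.

```python
# Computation times [sec] from the |L_e| part of Table 4, keyed by |L_e|.
pm_time = {1: 54.5, 3: 5.1, 7: 1.9, 10: 0.97}          # proposed method (PM)
gt_time = {1: 2195.9, 3: 644.0, 7: 205.9, 10: 122.1}   # original GTRACE (GT)

# Ratio GT / PM for each setting of |L_e|.
speedup = {k: gt_time[k] / pm_time[k] for k in pm_time}

for k, ratio in speedup.items():
    print(f"|L_e| = {k:2d}: GT / PM = {ratio:.1f}")
```

For these four settings the ratio lies between roughly 40 and 126.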

5.2 Real-World Dataset

To assess the practicality of the proposed method, it was applied to the Enron 
Email Dataset [TJ [TB] . In the dataset, we assigned a vertex ID to each person 
participating in an email communication, and assigned an edge to a pair communicating via email on a particular day, thereby obtaining a daily graph. In addition, one of the labels {CEO, Employee, Director, Manager, Lawyer, President, Trader, Vice President} was assigned to each vertex, and we labeled each edge according to the volume of mail. We then obtained a set of weekly graph sequence data, i.e., a $DB$. The total number of weeks, i.e., number of sequences, was 123. We randomly sampled $|V|$ (= 1 to 182) persons to form each $DB$.

Table 5 shows the computation times (comptime [sec]) and the numbers of mined rFTSs or FTSs (# of rFTSs or # of FTSs) obtained for various numbers of vertex IDs (persons) $|V|$, minimum support $\sigma'$, and numbers of interstates $n$ in each graph sequence of the dataset. All the other parameters were set to the default values indicated at the bottom of the table. Thus, the dataset with the default values contained 123 graph sequences each consisting of 182 persons (vertex IDs) and 7 interstates. The parameter $n$ = 4, 5, 6, or 7 indicates that each sequence $d$ in $DB$ consists of 4, 5, 6, or 7 interstates from Monday to Thursday, Friday, Saturday, or Sunday, respectively.

The upper, middle, and lower parts of the table show the practical scalability of the proposed method with regard to the number of persons (vertex IDs), the minimum support threshold, and the number of interstates in graph sequences in the graph sequence database, respectively. The original GTRACE proved intractable for the graph sequence dataset generated from the default values, despite the change in each graph in this graph sequence database being gradual. On the other hand, execution of the proposed method is tractable with respect to the database. Good scalability of the proposed method is indicated in Table 5, because the computation times for the proposed method are smaller than those for the original GTRACE. The scalability of the proposed method comes from mining only rFTSs based on the principle of a reverse search and the efficient implementation as discussed in Section 4.



6 Discussion 



In Section 4.3, we mentioned that 17 frequent sequential patterns are mined from $s_1$ and $s_2$ given by Eqs. (3) and (4), respectively. Some of the mined patterns from $s_1$ and $s_2$ are given below.

$\langle i_1\ i_6 \rangle = \langle vi^{(1)}_{[1,A]}\ ei^{(2)}_{[(2,3),-]} \rangle$,   (5)
$\langle (i_1\ i_2) \rangle = \langle vi^{(1)}_{[1,A]}\ vi^{(1)}_{[2,B]} \rangle$,   (6)
$\langle (i_1\ i_2)\ i_4 \rangle = \langle vi^{(1)}_{[1,A]}\ vi^{(1)}_{[2,B]}\ ei^{(2)}_{[(1,2),-]} \rangle$,   (7)
$\langle (i_1\ i_2)\ i_6 \rangle = \langle vi^{(1)}_{[1,A]}\ vi^{(1)}_{[2,B]}\ ei^{(2)}_{[(2,3),-]} \rangle$.   (8)

Fourteen frequent sequential patterns, including Eqs. (5) to (8), of the 17 patterns should not be mined from the projected transformation sequences with respect to $s_p = \langle ei^{(1)}_{[(1,2),-]}\ ei^{(2)}_{[(2,3),-]} \rangle$, because the transformation sequences shown as Eqs. (5) to (8) do not contain $s_p$ as a proper subsequence according to the principle of the reverse search. In addition, the transformation sequences shown as Eqs. (5), (6), and (8) are not rFTSs. By converting $s_1$ and $s_2$ to $s'_1$ and $s'_2$, respectively, we mine only rFTSs that should be mined from the projected transformation subsequences.
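
The counts discussed above can be reproduced with a small Python sketch. It uses a plain pattern-growth miner that tests containment directly, which is a simplification and not PrefixSpan itself, and it is not the authors' code. The itemset sequences follow Eqs. (3) and (4) and the converted sequences $s'_1$ and $s'_2$; items are encoded as integers, annotated items as (index, tag) pairs, and items without an annotation receive an empty tag, which is an encoding choice made only for this sketch.

```python
from typing import FrozenSet, List

Sequence = List[FrozenSet]


def contains(seq: Sequence, pattern: Sequence) -> bool:
    """Greedy check that `pattern` is a subsequence of the itemset sequence `seq`."""
    pos = 0
    for itemset in pattern:
        while pos < len(seq) and not itemset <= seq[pos]:
            pos += 1
        if pos == len(seq):
            return False
        pos += 1
    return True


def mine(db: List[Sequence], min_sup: int) -> List[Sequence]:
    """Enumerate all sequential patterns contained in at least min_sup sequences."""
    items = sorted({x for seq in db for itemset in seq for x in itemset})
    found: List[Sequence] = []

    def grow(pattern: Sequence) -> None:
        for x in items:
            # Either start a new itemset, or enlarge the last itemset with a
            # larger item so that every pattern is generated exactly once.
            candidates = [pattern + [frozenset([x])]]
            if pattern and x > max(pattern[-1]):
                candidates.append(pattern[:-1] + [pattern[-1] | {x}])
            for cand in candidates:
                if sum(contains(seq, cand) for seq in db) >= min_sup:
                    found.append(cand)
                    grow(cand)

    grow([])
    return found


def seq(*itemsets) -> Sequence:
    return [frozenset(s) for s in itemsets]


# Eqs. (3) and (4).
s1 = seq({1, 2}, {3, 4}, {5}, {6}, {7})
s2 = seq({1, 2, 3}, {4}, {6}, {8, 9})
plain = mine([s1, s2], 2)

# Converted sequences s'_1 and s'_2 with position tags relative to s = <i_4 i_6>.
s1_conv = seq({(1, "<1st"), (2, "<1st")}, {(3, "=1st")}, {(5, "<2nd")}, {(7, "")})
s2_conv = seq({(1, "<1st"), (2, "<1st"), (3, "<1st")}, {(8, ""), (9, "")})
converted = mine([s1_conv, s2_conv], 2)

print(len(plain), len(converted))
```

Mining the unconverted sequences yields 17 patterns, whereas the converted sequences yield only the three patterns that are extended back to rFTSs containing $s_p$.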



7 Conclusion 

In this paper, we proposed an efficient method for mining all rFTSs from a 
given set of graph sequences. We developed a graph sequence mining program, 
and confirmed the efficient and practical performance of the proposed method 
through computational experiments using artificial and real-world datasets. The 
method proposed in this paper efficiently enumerates all rFTSs from a set of 
graph sequences, whereas the methods in [31 [3] mine all frequent patterns from a 
long graph sequence. In [ISIIH], it is shown that the principle of growing possible 
patterns can be distinguished from the principle of counting support values of 
the patterns. Therefore, the proposed method in this paper can be extended to 
mine rFTSs from a long and large graph sequence based on [T5| [8] . By extending 
our method to mine from graph sequences, we plan to compare the performance 
of our method with that of Berlingerio's recently proposed method.

Table 4: Results for various $|DB|$, $|V_{avg}|$, $p_i$, $|L_e|$, and $\sigma'$.

$|DB|$            1,000         3,000         7,000         10,000
avg. len.         42.9          44.0          43.5          43.4
PM comptime       2.0           [illegible]   [illegible]   [illegible]
# of rFTSs        6307          6872          6325          6266
GT comptime       217.3         951.2         2226.2        3465.3
# of FTSs         171787        190876        170387        170902

$|V_{avg}|$       6             8             15            20
avg. len.         42.9          61.2          135.5         188.7
PM comptime       2.0           [illegible]   998.7         [illegible]
# of rFTSs        6307          20755         3653370       81012875
GT comptime       217.3         3599.3        -             -
# of FTSs         171787        936115        -             -

$p_i$ [%]         55            70            80            100
avg. len.         116.7         65.0          42.9          18.7
PM comptime       [illegible]   7.5           2.0           0.7
# of rFTSs        4458046       58251         6307          585
GT comptime       -             4132.2        217.3         8.5
# of FTSs         -             2355657       171787        9765

$|L_e|$           1             3             7             10
avg. len.         43.5          43.7          43.8          43.0
PM comptime       54.5          5.1           1.9           0.97
# of rFTSs        70257         12972         5318          3333
GT comptime       2195.9        644.0         205.9         122.1
# of FTSs         2995499       474564        132538        62897

$\sigma'$ [%]     5             7.5           10            15
avg. len.         42.9          42.9          42.9          42.9
PM comptime       377.1         30.7          2.0           1.8
# of rFTSs        90607156      5744037       6307          1630
GT comptime       2122.4        472.3         217.3         93.3
# of FTSs         176177313     11891069      171787        43100

PM: Proposed Method, GT: the original GTRACE, comptime: computation time [sec], avg. len.: average length of transformation sequences, # of rFTSs (or FTSs): the number of mined rFTSs (or FTSs)

Table 5: Results for the Enron dataset.

# of persons $|V|$           100       140       150       182
PM comptime                  0.5       2.2       15.9      278.6
# of rFTSs                   33227     31391     47015     158833
GT comptime                  16.8      118.3     -         -
# of FTSs                    66072     154541    -         -

min. sup. $\sigma'$ [%]      40        30        20        10
PM comptime                  2.0       6.1       30.0      278.6
# of rFTSs                   974       3548      14419     158833
GT comptime                  14.4      387.3     -         -
# of FTSs                    4129      29253     -         -

# of interstates $n$         4         5         6         7
PM comptime                  5.85      34.1      95.6      278.6
# of rFTSs                   5542      21214     51727     158833
GT comptime                  423.8     -         -         -
# of FTSs                    67997     -         -         -

Default: minimum support $\sigma' = 10\%$, # of vertex labels $|L_v| = 8$, # of edge labels $|L_e| = 5$, # of persons $|V| = 182$, # of interstates $n = 7$. PM: Proposed Method, GT: the original GTRACE